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Abstract. A great many systems can be modeled in the non-linear dynamical systems framework, 
as x = f(x ) + 5, (t), where /() is the potential function for the system, and £, is the excitation 
noise. Modeling the potential using a set of basis functions, we derive the posterior for the basis 
coefficients. A more challenging problem is to determine the set of basis functions that are required 
to model a particular system. We show that using the Bayesian Information Criteria (BIC) to 
rank models, and the beam search technique, that we can accurately determine the structure of 
simple non-linear dynamical system models, and the structure of the coupling between non-linear 
dynamical systems where the individual systems are known. This last case has important ecological 
applications. 
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INTRODUCTION 

Many natural systems can be modeled as noise-driven nonlinear dynamical systems. 
The classic example is the Lorenz system [1], intended as a simplified representation 
of climate variables. Other examples are the human cardiovascular system [2], where 
the heart-lung oscillations show physiologically important couplings, and predator-prey 
interactions, where one predator species feeds primarily on a single prey species, but 
will, when necessary, feed on others [3]. These last two are examples of an important 
class of problem where it is the coupling between subsystems which is of scientific 
importance. 

A non-linear dynamical system is typically modeled as evolving according to 

x-(f)=f(x) + €(r), (1) 

where f(x) is the system’s potential field, and £,(t) an additive noise process that drives 
the dynamics. The objects of inference are the field, f(x), and the intensity of the noise. 
A number of methods have been developed for these tasks. Deterministic methods fail to 
give accurate parameter estimates in the presence of noise [4]. A method was developed 
in [5] which was termed a maximum likelihood estimator, but was derived from an ad- 
hoc cost function. A Bayesian approach was formulated in [6], however this relied on 
MCMC, and so is too computationally intensive when the structure of f(x) must also 
be determined, as is the case here. A different estimation framework was suggested 
in [2, 7] that uses a path integral approach to derive a closed form expression for 
the parameter estimates. This method dramatically, speeds up the convergence of the 



inference process and makes it extremely stable to a large levels of dynamical noise E, (t). 
We use the parameter estimates obtained via this method together with the Bayesian 
Information Criteria (BIC) [8] to estimate the structure (in terms of basis functions) 
of f(x), using Beam Search, a non-complete search algorithm quite well known in the 
artificial intelligence community, but with little exposure in the Bayesian modeling field. 
Finally, we present a number of examples of learning the structure of unknown non- 
linear dynamical systems and the coupling between known systems. 

PARAMETER ESTIMATION FOR NON-LINEAR DYNAMICAL 

SYSTEMS 

An A-dimensional non-linear dynamical system evolves according to equation 1, and is 
observed at times t k to give a time-series & = {( t k ,y k ); k = 0 : K}. We assume that the 
observation noise is negligible compared with the excitation noise E, ( t ), hence y k = x k . 
Following the approach of [2, 7], we parameterize the potential field as 

f(x) = U(x)c = f(x;c) 

where U is an N x M matrix of basis functions and c an M-dimensional vector. In this 
way we convert the problem of estimating a nonlinear f(x) into a problem that is linear in 
the parameters. We assume the excitation noise, E, (/), is stationary, white and Gaussian, 
and hence characterized by its covariance matrix, D. 

The posterior distribution of interest is 

p( c,D|$0 °c p(^\c,T>)p(c)p(D) 

where independent priors have been assumed. We use a diffuse Gaussian prior on the 
elements of c and a uniform prior on the elements of D. The likelihood can be written in 
the form of a path integral over the random trajectories 

pm c,d)= [ x(k] P m%-)j?[x(t)]®x(t), 

2x(f i) 

where 3£ = {x(r)} and t, <C to < f K ty so that the likelihood does not depend on the 
initial and final states. The probability functional &[x(t)\ is determined by the properties 
of the excitation noise E,{t) [9, 10]. Under the assumption of negligible observation 
noise, the log-likelihood is 

|logp(^|c,D) = IndetDT-^ £ [tr4>(y*;c) 

K K k=0 

+(y*- f (yuc)) r D -1 (y*-f(y*;c))] +Aln(2;r/z) 

where y k = (y*+i -y k )/h and the matrix 4>(x) = {dfi(x;c)/dxj}. The posterior is thus 
p(c,D]^) = ^exp[-5(c,D|^jj where 

S(c,D|^) = ^p(D)-c r w(D) + ^c r S(D)c 



P(D) = h £ yjD 3 y^ + A:in(detD) 

k - 0 

w(D) = E-V + Z ^ 1 

/:=0 

S(D) - E pr 1 + /£ 1 lj[ D - 1 U (t 

k = 0 

where c pr and E., r are the mean and covariance of the prior on c, U* = U(y^) and the 
components of vector v(x) are: 


UjD ] y/t — ^v(y*) 


N 


VmW = £ 
«=1 


dU nm (x ) 
dx n 


m = 1 : M. 


We find the MAP estimates by alternately optimizing for D assuming c is known, and 
then for c assuming D is known. The maximizations can be done analytically, giving . 

. 6 W (c) = i £ [y^-U(y^)c]„[y A -- U(y*)c]J 

k = 0 
and 

c=E 1 (D)w(D). 

Further details are given in [7]. 


STRUCTURE ESTIMATION USING BEAM SEARCH AND BIC 


In general, which basis functions to include in U(x) will be unknown. Sometimes the 
type of basis function will be known from physical considerations (see later), but often it 
will be necessary to use a very general basis. In this case it is important to determine 
which of the basis terms are required to model the system. A very general basis is 
the Volterra expansion [11], where V(N,k) is an expansion up to order k of an N- 
dimensional vector. For example 

T(3,2) { 1 ]X\ jX2 A3 ) ,X\X2 Al A 3 ,X 2 A 2 A 3 ,.£ 3 } . 

A model, is then specified by which of the basis, terms are included. 

The formal Bayesian solution is to compute the integrated likelihood, or evidence, 
\ty). In practice, for large datasets, the BIC can provide a good approximation [ 8 ], 


BIC{JT) = Iogp(^!c, D) - I log AT* 


where N m is the number of terms in model . In either case the search is over l Nm 
models, and it rapidly becomes impossible to examine all models to find the model 





FIGURE 1. (left) Simulated trajectory from the Lorenz attractor, (right) Simulated trajectory from the 
system with the xf term included in the equation for x 3 .. 


with the highest BIC score. Instead, some non-exhaustive search must be used, with no 
guarantee of finding the optimum model. Instead, we hope to choose a search technique 
which returns a good enough model in an acceptable length of time. 


Beam Search 

Beam search is a heuristic where a set of models (the “beam”) are maintained at 
each step, and it is hoped that the diversity of models retained will allow the algorithm 
to escape from local minima. Applied to the non-linear dynamical system structure 
learning problem, the beam search algorithm is as follows: 

1. Choose the size, Nb, of the beam (the number of models to retain at each step). 

2. Index the models by a binary vector. 

3. Put models 1 . . . Nb on the beam and compute the BIC for each model. 

4. For each model on the beam 

• flip the next bit 

• compute the BIC 

5. Of the 2 Nb models now under consideration, keep the best Nb- 
■ 6. Goto 4, until all bits have been flipped 

Instead of examining 2 Nm models, this algorithm examines N m x N B . It is often useful to 
randomize the order in which the basis terms are added to each of the beam entries, to 
give a better exploration of the search space. 


Results for the Lorenz Attractor 

The Lorenz attractor is a three dimensional system governed by the equations 

X] = <7(X2— .*l) 

*2 = rx 1 - x 2 - XjX 3 

Jt3 = X\X2~ bx 3 




TABLE 1. Frequency of inclusion of each 
basis term in 100 repetitions. 
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where cr = 10, r = 28 and b — 8/3. A sample trajectory is shown in figure 1 for h — 0.005 
and the variance of the excitation noise being 0.001. A suitable basis for this system 
is the 7(3,2) basis discussed above. This has 10 terms for each of the 3 dimensions, 
resulting in a search over 2 30 ~ 10 9 models. Using beam search with a beam of size 
16 results in fewer than 800 models being examined. The algorithm was ran 100 times, 
using 5000 data points generated from the Lorenz system with a random initial start 
point, after 2000 data points were ignored to ensure that the trajectory had converged 
to the attractor. The frequency of inclusion of each term in the model is shown in table 
1. Apart from the x-$ term in the equation for A3, which was included 94% of the time, 
the other correct terms were always included. In the equation for A3, the term xf was 
also always included incorrectly. Typically the value of the coefficient for this term was 
-0.04, which is an order of magnitude smaller than the smallest of the other coefficients. 

Figure 1 (right) shows data simulated from the system with the xf term included in 
the equation for A3. Even with the extra term it is in good qualitative agreement with the 
data in figure 1 (left). 


Coupled Lorenz Attractors 

In many applications the system of interest is constructed by coupling together a 
number of simpler systems, where each of the simpler systems is either well known 
or can be studied in isolation. In these cases it is the structure of the coupling between 
the systems that is of interest. 

In [12] the authors studied a system of three coupled Lorenz attractors. Inspired by 
their system, we modify the second equation in each oscillator to be 


A2 = rxi — X2 — X1X3 + cxi 

where Ai is the xj component from the previous oscillator (in a circularly coupled man- 
ner). Figure 2 shows a simulation of this system with c = 28. To model this system with 
no knowledge of the underlying Lorenz attractors would require the use of the 7 (9, 2) 
basis, which has 55 terms for each dimension, a total of 495. A search over a space 
this size is infeasible. Instead, we search only over die coupling between oscillators, 
including in all models the terms known to be included, and excluding terms that are 
completely within a single oscillator that are not in the Lorenz equations. This reduces 
the search space down to 54 terms. 



FIGURE 2. Simulation of the system of three coupled Lorenz oscillators 
The estimated equations using the data in figure 2, and a beam of size 128 are 

xi 

X 2 

X 3 

X^ — X 4. j X5 

■*5 = * 1 , (* 2 ),-* 4 , * 5 ,-* 4*6 

*6 = X 6 ,X 4 X 5 ,(x 9 ) 

XJ = X7 , .X '8 

X 8 — (-^1 ) > (^ 2 ) > *^4 5 (*^5 ) 7 7 -^8 > X7X9 
X 9 = (x 3 ), (*5)1 Xg, X 7 X S 

where the terms not in the original equations are in parentheses. 

PREDATOR-PREY SYSTEMS 

The structure of predator-prey interactions in natural ecosystems is of great ecological 
interest. The structure can reveal which prey species are chosen preferentially by which 
predator species, and changes in this structure can be indicative of changes in the 
ecosystem due to external factors. 


= -*1>*2 

= X2,X2,X[X3, (*4 ), .*7 

= X 3 ,XiX 2 ,(x 9 ) 




FIGURE 3. Synthetic data from a system describing three coupled predator (bottom) - prey (top) pairs 


A multidimensional 2-trope predator-prey system can be modeled as 
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where P { are the predators and //, the prey. We assume the same number of predator 
species as prey species* and that predator P l preys predominandy on a single prey species 
Hi. The remaining interactions, specified by the sparse matrices a,y and are weak. 
Also a n = (5 a — 0. For this system we have also the constraint that for realism, if = 0 
then so does if species Pi preys on Hi, then P shows some increase in quality of life. 

Equations 2 and 3 are not in the form of equation 1, so we make the transformations 
Xi — In Hi and Y) = lnP„ giving 


dXj 

dt 

dYi 

dt 


= at — h/expA; 


C\ exp Yi t— Oti j exp Yj 
1 + exp Xi n 1 + exp Xi 


+m 


= - di + ei exp Xi + £ Pu ex P X J + Ci (0 
m 


The Vol terra basis used in the earlier examples is not suitable here. Instead, we construct 
a basis of terms of the form expX- and Figure 3 shows synthetic data from a 

system of three coupled predator-prey pairs. Table 2 shows the system of equations used 
to generate this data, and also the frequency of inclusion of each term in the estimated 
model for ten data sets generated with different initial conditions. None of the terms not 
represented in table 2 was included in any of the estimated models. 





TABLE 2. Frequency of inclusion of each basis term in 10 repe- 
titions. 
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CONCLUSIONS 

We have presented Bayesian inference methods for the potential of a non-linear dynam- 
ical system when the potential is described in terms of basis functions. We have shown 
that the search method known as “beam search” can be applied with BIC as the scor- 
ing function to the problem of determining which of the basis terms are required for a 
particular system. We have presented examples representative of an important class of 
system, and demonstrated reasonable performance in those cases. 
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